An Adaptive Algorithm for Tolerating Value Faults and Crash Failures
نویسندگان
چکیده
The AQuA architecture provides adaptive fault tolerance to CORBA applications by replicating objects and providing a high-level method that an application can use to specify its desired level of dependability. This paper presents the algorithms that AQuA uses, when an application’s dependability requirements can change at runtime, to tolerate both value faults in applications and crash failures simultaneously. In particular, we provide an active replication communication scheme that maintains data consistency among replicas, detects crash failures, collates the messages generated by replicated objects, and delivers the result of each vote. We also present an adaptive majority voting algorithm that enables the correct ongoing vote while both the number of replicas and the majority size dynamically change. Together, these two algorithms form the basis of the mechanism for tolerating and recovering from value faults and crash failures in AQuA.
منابع مشابه
Reliability and Performance Evaluation of Fault-aware Routing Methods for Network-on-Chip Architectures (RESEARCH NOTE)
Nowadays, faults and failures are increasing especially in complex systems such as Network-on-Chip (NoC) based Systems-on-a-Chip due to the increasing susceptibility and decreasing feature sizes. On the other hand, fault-tolerant routing algorithms have an evident effect on tolerating permanent faults and improving the reliability of a Network-on-Chip based system. This paper presents reliabili...
متن کاملA weakly-adaptive condition-based consensus algorithm in asynchronous distributed systems
The consensus problem is one of the most fundamental and important problems for designing fault-tolerant distributed systems. In the consensus problem, each process proposes a value, and all non-faulty processes have to agree on a common value that is proposed by a process. Despite of many applications (e.g., atomic broadcast [2, 6], shared object [1, 7], weak atomic commitment [5]), it is know...
متن کاملAn Algorithm for Tolerating Crash Failures in Distributed Systems
In the framework of the ESPRIT project 28620 “TIRAN” (tailorable fault tolerance frameworks for embedded applications), a toolset of error detection, isolation, and recovery components is being designed to serve as a basic means for orchestrating application-level fault tolerance. These tools will be used either as stand-alone components or as the peripheral components of a distributed applicat...
متن کاملOne Terminal Digital Algorithm for Adaptive Single Pole Auto-Reclosing Based on Zero Sequence Voltage
This paper presents an algorithm for adaptive determination of the dead timeduring transient arcing faults and blocking automatic reclosing during permanent faults onoverhead transmission lines. The discrimination between transient and permanent faults ismade by the zero sequence voltage measured at the relay point. If the fault is recognised asan arcing one, then the third harmonic of the zero...
متن کاملImplementing Fault-Tolerant Services Using State Machines: Beyond Replication
This paper describes a method to implement fault-tolerant services in distributed systems based on the idea of fused state machines. The theory of fused state machines uses a combination of coding theory and replication to ensure efficiency as well as savings in storage and messages during normal operations. Fused state machines may incur higher overhead during recovery from crash or Byzantine ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- IEEE Trans. Parallel Distrib. Syst.
دوره 12 شماره
صفحات -
تاریخ انتشار 2001